SDA 3.5 Documentation for SUBSET
NAME
subset - Make a subset of an SDA dataset
USAGE
subset -b filename
DESCRIPTION
The subset program can create a data file that contains only a
subset of the variables and/or cases in an SDA dataset. The
program also generates a matching DDL file.
The output data file is an ASCII fixed-column file with one
record per case, with an optional delimiter (blank or comma)
between variables on the same record. If the selected output
format is comma-separated values (CSV), a header record containing
the variable names is also written. For more information on the
various output data formats, see the appropriate section in the
online help file for the subset program.
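As a rough illustration of how the output formats differ, the Python sketch below lays out one case as a fixed-column record (TYPE=TEXT), the same record with a blank between variables (TYPE=TEXTBL), and comma-separated values preceded by a header record of variable names (TYPE=CSV). It is not the subset program's code. The column widths are supplied by hand here, whereas the real program takes them from the DDL, and the right-justified, blank-padded fields and unquoted CSV values are assumptions.

```python
# Hypothetical sketch of the three output layouts; not SDA's implementation.
# Assumptions: widths are given by the caller (SDA reads them from the DDL),
# fields are right-justified and blank-padded, and CSV values are not quoted.

def format_record(values, widths, out_type="TEXT"):
    """Format one case (a list of strings) as a single output record."""
    if out_type == "CSV":
        return ",".join(values)
    # Fixed-column layout: each value occupies exactly its declared width.
    fields = [v.rjust(w) for v, w in zip(values, widths)]
    if out_type == "TEXTBL":
        return " ".join(fields)  # one blank between variables
    if out_type == "TEXT":
        return "".join(fields)  # no delimiter at all
    raise ValueError(f"unknown TYPE: {out_type}")


def format_file(names, rows, widths, out_type="TEXT"):
    """Return all output lines; only CSV output starts with a header record."""
    lines = [",".join(names)] if out_type == "CSV" else []
    lines += [format_record(row, widths, out_type) for row in rows]
    return lines
```

For example, with widths 4, 2, and 2, the case `1, 34, 12` becomes `   13412` under TEXT and `   1 34 12` under TEXTBL, while CSV output would be the header `CASEID,age,educ` followed by `1,34,12`.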
Ordinarily this program is invoked by the Web interface for the
SDA programs, and the user does not have to deal with the
keywords given in this document. However, the procedure can also
be run directly in batch mode by preparing a command file that
specifies the variables to include in the subset and the options
to use. This document explains how to prepare such a file.
Meaning of the ‘-b’ flag
- -b filename
  The name of the batch command file is specified to the program
  after the ‘-b’ flag.
KEYWORDS
The batch command file contains specifications for the subset.
These specifications are given in the form "keyword = value",
with one keyword per line. Keywords may be given in any order,
in either upper or lower case. The valid keywords are as follows
(with significant characters shown in capital letters):
=====================================================================
Keyword      Possible Specification             Default (if no keyword)
_____________________________________________________________________
Specify ONE of the following two data sources:
STUdy=       path of dataset directory          Required for subset from
             (can be repeated)                  SDA dataset
INDATa=      path of ASCII data file            No subset from ASCII file
INDDL=       path of DDL file                   No subset from ASCII file
             (specify both data and DDL)

Other specifications:
Filter=      name(s) and codes of filter        No filter
             variable(s)
VARList=     path of file with list of          REQUIRED
             variables to output
TYPE=        type of data file to produce:      TEXT
               -TEXT:   text file, no blanks
               -TEXTBL: text file with blanks
                        between variables
               -CSV:    comma separated values
OUTDATa=     filename for output data           outdata.txt
             (overwrites existing file)
OUTDDL=      filename for output DDL            outddl.txt
             (overwrites existing file)
WEBMSGfile=  filename to capture output         No record of user messages
             displayed to Web users of
             the subset procedure
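The keyword rules above (one "keyword = value" per line, any order, either case, abbreviations down to the capitalized characters) can be sketched as a small Python reader. This is a hypothetical illustration, not SDA's own parser. It assumes that an abbreviation is accepted when it contains at least the capitalized characters and is a prefix of the full keyword name, and that blank lines are ignored.

```python
# Hypothetical reader for a subset batch command file; not SDA's parser.
# Assumption: a keyword is accepted when it is at least as long as its
# capitalized (significant) part and is a prefix of the full keyword name.

KEYWORDS = {  # full keyword name -> number of significant characters
    "STUDY": 3, "INDATA": 5, "INDDL": 5, "FILTER": 1, "VARLIST": 4,
    "TYPE": 4, "OUTDATA": 6, "OUTDDL": 6, "WEBMSGFILE": 6,
}


def match_keyword(word):
    """Return the full keyword name for a possibly abbreviated keyword, or None."""
    w = word.strip().upper()  # keywords are case-insensitive
    for full, significant in KEYWORDS.items():
        if len(w) >= significant and full.startswith(w):
            return full
    return None


def parse_command_file(text):
    """Parse 'keyword = value' lines into {full name: [values]}.

    Values are kept in lists because STUDY may be repeated.
    """
    spec = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        word, sep, value = line.partition("=")
        full = match_keyword(word) if sep else None
        if full is None:
            raise ValueError(f"line {lineno}: unrecognized keyword line: {line!r}")
        spec.setdefault(full, []).append(value.strip())
    return spec
```

Under this assumption, `varl = mylist.txt` is read as VARLIST, while `outd = x` is rejected because "outd" is shorter than the significant part of both OUTDATa and OUTDDL.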
NOTES
- Source of the data
The source of the data to subset may be specified EITHER as:
- an SDA dataset (or multiple SDA datasets, if there are
recoded and computed variables), OR
- a text (ASCII) data file plus a DDL file.
- Order of the variables
The variables in the subset are written to the data file in the
order in which they are listed in the ‘varlist’ file.
- Codebook
When the subset procedure is run from the Web interface, a simple
ASCII codebook can be produced, because the Web version can
invoke the SDA xcodebk program together with the generated DDL
file. The batch subset procedure does not do that automatically,
but you can run the ‘xcodebk’ program yourself in batch mode if
you need a codebook.
- Data definitions for SAS, SPSS, and Stata
When the subset procedure is run from the Web interface, data
definitions or metadata for SAS, SPSS, Stata, and DDI (version 2)
can be produced, because the Web version can invoke the SDA
ddltox program together with the generated DDL file. The batch
subset procedure does not do that automatically, but you can run
the ‘ddltox’ program yourself in command-line mode if you need
data definitions for one or more of those programs.
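The data-source rule in the first note (an SDA dataset, possibly given as several STUdy= lines, or else an ASCII data file together with its DDL file) can be checked before the program is run. A minimal Python sketch, assuming the keywords have already been read into a dictionary that maps full keyword names to lists of values; the function name `check_source` is hypothetical:

```python
# Hypothetical pre-flight check of the data source; not part of SDA.
# Assumption: spec maps full keyword names ("STUDY", "INDATA", "INDDL")
# to lists of values, since STUDY may be given more than once.

def check_source(spec):
    """Return 'study' or 'ascii', or raise ValueError if the source is invalid."""
    has_study = bool(spec.get("STUDY"))
    has_data = bool(spec.get("INDATA"))
    has_ddl = bool(spec.get("INDDL"))
    if has_study and (has_data or has_ddl):
        raise ValueError("specify EITHER STUDY or INDATA/INDDL, not both")
    if has_study:
        return "study"  # one or more SDA dataset directories
    if has_data and has_ddl:
        return "ascii"  # text data file plus its DDL file
    if has_data or has_ddl:
        raise ValueError("INDATA and INDDL must be given together")
    raise ValueError("no data source: give STUDY, or both INDATA and INDDL")
```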
EXAMPLE
Simple example of a batch command file
The SDA dataset is in the directory ‘/sa/sdatest’. The results
will overwrite the output data and DDL files if those files
already exist.
study = /sa/sdatest
varlist = mylist.txt
outdata = mydata.txt
outddl = myddl.txt
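When many subsets are needed, the command file can be generated by a script and passed to the program with the ‘-b’ flag. A hypothetical Python sketch follows. The helper names are illustrative only, and `run_subset` assumes that the SDA `subset` executable is on the PATH and that a nonzero exit status signals failure.

```python
# Hypothetical helpers for scripting batch runs; names are illustrative.
import subprocess
from pathlib import Path


def write_command_file(path, study, varlist, outdata=None, outddl=None, out_type=None):
    """Write a subset batch command file with one 'keyword = value' per line."""
    lines = [f"study = {study}", f"varlist = {varlist}"]
    if out_type:
        lines.append(f"type = {out_type}")  # TEXT, TEXTBL, or CSV
    if outdata:
        lines.append(f"outdata = {outdata}")
    if outddl:
        lines.append(f"outddl = {outddl}")
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


def run_subset(command_file):
    """Run 'subset -b <file>'; assumes subset is on PATH, raises on nonzero exit."""
    subprocess.run(["subset", "-b", str(command_file)], check=True)
```

Calling `write_command_file("mycmd.txt", "/sa/sdatest", "mylist.txt", "mydata.txt", "myddl.txt")` reproduces the example command file above, and `run_subset("mycmd.txt")` then runs the equivalent of `subset -b mycmd.txt`.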
Example of a file containing a variable list
(Variable names may be separated by spaces, commas, or line
breaks.)
CASEID age educ gender
spend, spend2, spend3 spend4
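Reading such a list amounts to splitting on any run of spaces, commas, or line breaks while keeping the original order, which is also the order in which the variables are written to the output data file. A minimal Python sketch, assuming no other separators or comment syntax are allowed in the file; `read_varlist` is a hypothetical name:

```python
# Hypothetical varlist reader; assumes only blanks, commas, and line breaks separate names.
import re


def read_varlist(text):
    """Split a variable list on spaces, commas, and line breaks, keeping file order."""
    return [name for name in re.split(r"[\s,]+", text) if name]
```

Applied to the two example lines above, it returns `['CASEID', 'age', 'educ', 'gender', 'spend', 'spend2', 'spend3', 'spend4']`.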
CSM, UC Berkeley
April 12, 2011